OpenAI · Chat / LLM · 121.7B Parameters · 256K Context

Function Calling · Tool Calling · Streaming · Reasoning · Agent Workflows · Long Context · Code
Overview
Introducing gpt-oss-120b, OpenAI’s flagship open-weight model in the gpt-oss series, built for advanced reasoning, large-scale agentic workloads, and enterprise-grade automation. Built on a Mixture-of-Experts (MoE) architecture, it activates only a small fraction of its parameters (about 5.1B) per token during inference, delivering strong intelligence at competitive latency. Designed for complex reasoning, multi-task agents, and long-horizon planning, gpt-oss-120b brings frontier-level capability to commercial and self-hosted deployments.
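To make the sparse-activation idea concrete, here is a toy illustration of token-choice top-k routing in Python. It is purely a sketch: the expert count, dimensions, and router below are invented for illustration and do not reflect the model's actual implementation.

```python
import numpy as np

# Toy token-choice MoE routing: each token selects its top-k experts by
# router score, so only a fraction of the total parameters run per token.
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16  # invented sizes, for illustration only
rng = np.random.default_rng(0)
router_weights = rng.normal(size=(DIM, NUM_EXPERTS))

def route(token: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    scores = token @ router_weights      # one score per expert
    top = np.argsort(scores)[-TOP_K:]    # indices of the k best experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                 # softmax over the chosen experts
    return top, gates                    # only these experts execute

experts, gates = route(rng.normal(size=DIM))
print("active experts:", experts, "gate weights:", gates)
```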
Model Specifications
| Field | Details |
|---|---|
| Model ID | openai/gpt-oss-120b |
| Provider | OpenAI |
| Kind | Chat / LLM |
| Architecture | Large-scale Mixture-of-Experts (MoE) with token-choice expert routing, SwiGLU activations, and sparse attention for reasoning efficiency |
| Model Size | 121.7B Params |
| Context Length | 256K Tokens |
| MoE | Yes |
| Release Date | August 2025 |
| License | Apache 2.0 |
| Training Data | Extensive multi-domain knowledge corpus with safety-aligned fine-tuning, enterprise and community feedback loops, and agentic task simulation datasets |
| Function Calling | Supported |
| Serverless API | Available |
| Fine-tuning | Coming Soon |
| On-demand | Coming Soon |
Pricing
Access via Qubrid’s serverless API with pay-per-token pricing. No infrastructure management required.
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $0.15 |
| Output Tokens | $0.61 |
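As a quick sanity check on per-request cost, the snippet below applies the listed rates; the token counts are made-up example values.

```python
# Cost at the listed serverless rates (USD per 1M tokens)
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.61

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the request cost in USD for the given token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10K-token prompt with a 2K-token response
print(f"${estimate_cost(10_000, 2_000):.6f}")  # -> $0.002720
```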
Quickstart
Prerequisites
- Create a free account at platform.qubrid.com
- Generate your API key from the API Keys section
- Replace QUBRID_API_KEY in the code below with your actual key
Python
```python
from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms",
        }
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=1,
    stream=True,
)

# Print tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```
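If you prefer a single response instead of token-by-token streaming, set stream=False and read the complete message from the same client:

```python
# Non-streaming variant: the call returns one completed response object
response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    max_tokens=4096,
    temperature=0.7,
    top_p=1,
    stream=False,
)
print(response.choices[0].message.content)
```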
JavaScript
```javascript
import OpenAI from "openai";

// Initialize the OpenAI client with the Qubrid base URL
const client = new OpenAI({
  baseURL: "https://platform.qubrid.com/v1",
  apiKey: "QUBRID_API_KEY",
});

// Create a streaming chat completion
const stream = await client.chat.completions.create({
  model: "openai/gpt-oss-120b",
  messages: [
    {
      role: "user",
      content: "Explain quantum computing in simple terms",
    },
  ],
  max_tokens: 4096,
  temperature: 0.7,
  top_p: 1,
  stream: true,
});

// Print tokens as they arrive
for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log();

// With stream: false, the same call resolves to a single completion object:
// const completion = await client.chat.completions.create({ /* same options, stream: false */ });
// console.log(completion.choices[0].message.content);
```
Go
```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	url := "https://platform.qubrid.com/v1/chat/completions"

	// Request body for a streaming chat completion
	data := map[string]interface{}{
		"model": "openai/gpt-oss-120b",
		"messages": []map[string]string{
			{
				"role":    "user",
				"content": "Explain quantum computing in simple terms",
			},
		},
		"temperature": 0.7,
		"max_tokens":  4096,
		"stream":      true,
		"top_p":       1,
	}

	jsonData, err := json.Marshal(data)
	if err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	// The stream arrives as server-sent events: each non-empty line has the
	// form "data: {...chunk JSON...}" and the stream ends with "data: [DONE]".
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if line != "" {
			fmt.Println(line)
		}
	}
}
```
cURL
```bash
curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 4096,
    "stream": true,
    "top_p": 1
  }'
```
Live Example
Prompt: Explain quantum computing in simple terms
Response:
Quantum Computing in Plain English
1. Classical computers use bits — either 0 or 1 (like a light switch: off or on).
2. Quantum computers use qubits, which can be 0, 1, or both at the same time
(called superposition) — like a coin spinning in the air before it lands.
3. This lets quantum computers explore many possible answers simultaneously,
making them exponentially faster for certain problems like:
- Breaking encryption
- Drug discovery
- Optimization (logistics, finance)
- Simulating molecules
4. Entanglement: Two qubits can be linked so that the state of one instantly
affects the other — even across distances.
In short: classical computers are great at most tasks. Quantum computers are
a specialized tool for problems too complex for even the fastest supercomputers.
Playground Features
The Qubrid Playground supports advanced prompt engineering features out of the box:
🧠 System Prompt
Set a persistent instruction that shapes how the model behaves across the entire conversation.
Example: "You are a senior software engineer. Answer all questions with concise,
production-ready code and explain your reasoning step by step."
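Over the API, the same persistent instruction is passed as a system message at the start of the conversation — a minimal sketch reusing the Python client from the Quickstart (the user question is an illustrative placeholder):

```python
# The system message shapes every subsequent turn in the conversation
response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a senior software engineer. Answer all questions with "
                "concise, production-ready code and explain your reasoning step by step."
            ),
        },
        {"role": "user", "content": "How should I paginate a REST API?"},
    ],
)
print(response.choices[0].message.content)
```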
🎯 Few-Shot Examples
Guide the model by showing it example input/output pairs before your actual query — no fine-tuning needed.
| User Input | Assistant Response |
|---|---|
| What is a closure in JS? | A closure is a function that retains access to its outer scope even after the outer function has returned... |
| Explain recursion | Recursion is when a function calls itself. A base case stops the recursion. Example: factorial(n) = n * factorial(n-1) |
💡 Few-shot examples are powerful for domain-specific formatting, tone control, and structured outputs — available directly in the playground UI.
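Over the API, few-shot examples are simply prior user/assistant turns placed ahead of the real query — a minimal sketch using the pairs from the table above (the final question is an illustrative placeholder):

```python
# Few-shot prompting: example turns teach the model the desired style and format
messages = [
    {"role": "user", "content": "What is a closure in JS?"},
    {"role": "assistant", "content": "A closure is a function that retains access to its outer scope even after the outer function has returned..."},
    {"role": "user", "content": "Explain recursion"},
    {"role": "assistant", "content": "Recursion is when a function calls itself. A base case stops the recursion. Example: factorial(n) = n * factorial(n-1)"},
    # The actual query, answered in the same style as the examples above
    {"role": "user", "content": "What is a JavaScript promise?"},
]
response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=messages,
)
print(response.choices[0].message.content)
```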
Inference Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output. |
| Temperature | number | 0.7 | Controls randomness. Higher values mean more creative but less predictable output. |
| Max Tokens | number | 4096 | Maximum number of tokens to generate in the response. |
| Top P | number | 1 | Nucleus sampling: considers only the tokens within the top_p probability mass. |
| Reasoning Effort | select | medium | Controls how much reasoning effort the model should apply. |
| Reasoning Summary | select | concise | Controls the level of explanation in the reasoning summary. |
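The sampling parameters map directly onto the request fields shown in the Quickstart. The two reasoning controls are exposed in the playground; if your endpoint also accepts them over the API, OpenAI-style field names are a reasonable guess, but the sketch below treats both names as assumptions — check Qubrid's API reference before relying on them.

```python
# Hedged sketch: "reasoning_effort" and "reasoning_summary" are ASSUMED field
# names (OpenAI-style); verify against Qubrid's API reference before use.
response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Plan a 3-step rollout for a feature flag."}],
    max_tokens=4096,
    temperature=0.7,
    top_p=1,
    extra_body={
        "reasoning_effort": "medium",    # low | medium | high (assumed)
        "reasoning_summary": "concise",  # assumed field name
    },
)
print(response.choices[0].message.content)
```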
Use Cases
- Autonomous agents and multi-step reasoning
- Advanced function calling and workflow orchestration
- Research-grade problem solving and planning
- Enterprise automation across verticals
- Large-scale code generation and debugging
- R&D assistance and scientific exploration
- Conversational AI and smart copilots
- Knowledge extraction and document understanding
- Long-context business intelligence and analytics
- Custom fine-tuning for domain-specific performance
Strengths & Limitations
| Strengths | Limitations |
|---|---|
| High-capacity MoE design for strong reasoning and generalization | Higher compute and memory requirements than smaller gpt-oss models |
| Sparse expert activation for high throughput (about 5.1B active parameters per token) | Latency may increase on single-GPU deployments |
| Strong performance under native MXFP4 quantization | Fine-tuning recommended for highly specialized enterprise domains |
| Scales across multi-GPU clusters and distributed inference setups | |
| Up to 256K context window with efficient sparse attention | |
| Superior agentic and planning abilities for sequential decision tasks | |
| Built-in support for structured, schema-based function calling (see the sketch after this table) | |
| Apache 2.0 license enabling commercial and derivative use | |
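Since structured function calling is a headline capability, here is a minimal sketch using the standard OpenAI-style tools parameter with the Quickstart client; get_weather is a hypothetical tool defined only for illustration.

```python
# Hypothetical tool schema, for illustration only
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model chose to call the tool, the call arrives as structured JSON
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```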
Why Qubrid AI?
- No infrastructure setup — serverless API, pay only for what you use
- OpenAI-compatible — drop-in replacement using the same SDK
- Enterprise-ready — API logs, usage tracking, and team management built in
- Multi-language support — Python, JavaScript, Go, cURL out of the box
- Fast onboarding — get your first response in under 2 minutes
Resources
- Full documentation index: https://docs.platform.qubrid.com/llms.txt
Built with ❤️ by Qubrid AI
Frontier models. Serverless infrastructure. Zero friction.